To facilitate research on text generation, this paper presents TextBox 2.0, a comprehensive and unified library focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers $13$ common text generation tasks and their corresponding $83$ datasets, and further incorporates $45$ PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement $4$ efficient training strategies and provide $4$ generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be carried out in a unified way. Despite its rich functionality, the library is easy to use, through either a friendly Python API or the command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at the link:
Recent work on 4D point cloud sequences has attracted considerable attention. However, obtaining exhaustively labeled 4D datasets is often expensive and laborious, so it is especially important to investigate how to utilize raw unlabeled data. Yet most existing self-supervised point cloud representation learning methods consider only the geometry of a static snapshot, omitting the fact that sequential observations of dynamic scenes can reveal more comprehensive geometric details. Meanwhile, video representation learning frameworks mostly model motion as image-space flows and are not aware of 3D geometry. To overcome these issues, this paper proposes a new 4D self-supervised pre-training method called Complete-to-Partial 4D Distillation. Our key idea is to formulate 4D self-supervised representation learning as a teacher-student knowledge distillation framework in which the student learns useful 4D representations under the guidance of the teacher. Experiments show that this approach significantly outperforms previous pre-training approaches on a wide range of 4D point cloud sequence understanding tasks, covering both indoor and outdoor scenarios.
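The abstract describes the distillation framework only at a high level. The following is a minimal sketch of the general teacher-student idea, assuming a teacher that sees the complete sequence and a student that sees a randomly subsampled (partial) copy, with the student's feature pulled toward the teacher's via a cosine-distance loss. The encoder (`toy_encoder`), the subsampling scheme (`partial_view`), and the loss are hypothetical placeholders, not the paper's design.

```python
import math
import random

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def distillation_loss(teacher_feat, student_feat):
    """Cosine-distance loss in [0, 2]; 0 when the student points the same way as the teacher.
    The teacher feature is treated as a fixed target (in training, no gradient flows into it)."""
    t = l2_normalize(teacher_feat)
    s = l2_normalize(student_feat)
    return 1.0 - sum(a * b for a, b in zip(t, s))

def partial_view(sequence, keep_ratio, rng):
    """Hypothetical 'partial' input: randomly keep a fraction of the points in each frame."""
    out = []
    for frame in sequence:
        k = max(1, int(len(frame) * keep_ratio))
        out.append(rng.sample(frame, k))
    return out

def toy_encoder(sequence):
    """Placeholder encoder: mean (x, y, z) over all points in all frames."""
    pts = [p for frame in sequence for p in frame]
    return [sum(p[i] for p in pts) / len(pts) for i in range(3)]

# Usage: teacher encodes the complete sequence, student encodes a partial copy.
rng = random.Random(0)
seq = [[(float(i), float(j), 1.0) for j in range(10)] for i in range(3)]
loss = distillation_loss(toy_encoder(seq), toy_encoder(partial_view(seq, 0.5, rng)))
```

In a real setup both encoders would be neural networks over point sequences, and only the student would be updated by this loss.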
Sky-image-based solar forecasting using deep learning has been recognized as a promising approach to reducing the uncertainty in solar power generation. However, one of the biggest challenges is the lack of massive and diversified sky image samples. In this study, we present a comprehensive survey of open-source ground-based sky image datasets for very short-term solar forecasting (i.e., a forecasting horizon of less than 30 minutes), as well as for related research areas that can potentially help improve solar forecasting methods, including cloud segmentation, cloud classification, and cloud motion prediction. We first identify 72 open-source sky image datasets that satisfy the needs of machine/deep learning. We then construct a database of information about various aspects of the identified datasets. To evaluate each surveyed dataset, we further develop a multi-criteria ranking system based on 8 dimensions of the datasets that could have important impacts on the usage of the data. Finally, we provide insights on the usage of these datasets for different applications. We hope this paper can provide an overview for researchers looking for datasets for very short-term solar forecasting and related areas.
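The abstract does not say how the 8 dimensions are combined into a ranking. One common approach, shown here purely as an illustrative sketch, is a weighted sum of min-max-normalized per-dimension scores. The dataset names, scores, and weights below are hypothetical, and only 3 dimensions are used for brevity.

```python
def min_max_normalize(values):
    """Scale a list of scores to [0, 1]; a constant column maps to all 1.0 (no discrimination)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rank_datasets(scores, weights):
    """scores: {dataset_name: [s_1, ..., s_d]} (higher is better);
    weights: d non-negative weights. Returns (name, score) pairs, best first."""
    names = list(scores)
    d = len(weights)
    # Normalize each dimension across datasets so that units do not dominate the sum.
    columns = [min_max_normalize([scores[n][j] for n in names]) for j in range(d)]
    total_w = sum(weights)
    totals = {
        n: sum(weights[j] * columns[j][i] for j in range(d)) / total_w
        for i, n in enumerate(names)
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical example: three datasets scored on three dimensions.
scores = {"A": [1, 10, 5], "B": [3, 0, 5], "C": [2, 5, 5]}
ranked = rank_datasets(scores, [2, 1, 1])
print(ranked)  # -> [('B', 0.75), ('C', 0.625), ('A', 0.5)]
```

A weighted sum is easy to audit, but the final order is sensitive to the chosen weights, so any real ranking system would need to justify them.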
Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on a suitable design of loss functions. Popular loss functions, including the cross-entropy and Dice losses, often fall short in boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We develop a novel loss function tailored to reflect boundary information and thereby enhance boundary detection. Because the contrast between the segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss, which is infused with a statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared with benchmark losses without a t-test component.
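The abstract does not give the PTA loss formula. As a rough illustration of the idea of augmenting a pixel-wise loss with a two-sample t-test over local patches, the sketch below computes Welch's t statistic between predicted foreground probabilities of ground-truth foreground and background pixels in each patch that straddles a boundary, and adds a penalty that shrinks as the two groups separate. The patch size, the `sigmoid_neg` penalty, the weight `lam`, and the use of binary cross-entropy as the base loss are all assumptions.

```python
import math

def welch_t(a, b, eps=1e-8):
    """Welch's two-sample t statistic (mean(a) - mean(b)) / standard error.
    Each sample needs at least 2 values; eps keeps the denominator non-zero."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb + eps)

def sigmoid_neg(t):
    """Numerically stable 1 / (1 + exp(t)): close to 0 for large positive t."""
    if t >= 0:
        z = math.exp(-t)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(t))

def pta_style_loss(pred, target, patch=4, lam=1.0):
    """Mean binary cross-entropy plus a hypothetical per-patch t-test penalty.
    pred: 2D list of foreground probabilities; target: 2D list of 0/1 labels."""
    rows, cols = len(pred), len(pred[0])
    bce = 0.0
    for r in range(rows):
        for c in range(cols):
            p = min(max(pred[r][c], 1e-7), 1 - 1e-7)
            y = target[r][c]
            bce -= y * math.log(p) + (1 - y) * math.log(1 - p)
    bce /= rows * cols

    penalties = []
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            fg, bg = [], []
            for r in range(r0, min(r0 + patch, rows)):
                for c in range(c0, min(c0 + patch, cols)):
                    (fg if target[r][c] == 1 else bg).append(pred[r][c])
            if len(fg) >= 2 and len(bg) >= 2:  # patch straddles a boundary
                penalties.append(sigmoid_neg(welch_t(fg, bg)))
    penalty = sum(penalties) / len(penalties) if penalties else 0.0
    return bce + lam * penalty
```

For example, with a 4x4 target whose left half is foreground, a prediction that separates the halves sharply scores lower than one hovering around 0.5, because both the cross-entropy and the t-test penalty drop.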
In recent years, Multi-Agent Path Finding (MAPF) has attracted attention from the fields of both Operations Research (OR) and Reinforcement Learning (RL). However, in the 2021 Flatland3 Challenge, a competition on MAPF, the best RL method scored only 27.9, far less than the best OR method. This paper proposes a new RL solution to the Flatland3 Challenge that scores 125.3, several times higher than the previous best RL solution. We creatively apply a novel network architecture, TreeLSTM, to MAPF in our solution. Together with several other RL techniques, including reward shaping, multiple-phase training, and centralized control, our solution is comparable to the top 2-3 OR methods.
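The abstract does not specify which TreeLSTM variant is used or how trees are built from the Flatland rail network. As an illustration only, here is a pure-Python sketch of the Child-Sum TreeLSTM cell of Tai et al. (2015), which combines any number of child states bottom-up with one forget gate per child. The dimensions, random weights, and toy tree are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Overflow-free logistic function via tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def vadd(*vs):
    return [sum(xs) for xs in zip(*vs)]

class ChildSumTreeLSTMCell:
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        def mat(r, c):
            return [[rng.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]
        self.hid = hid_dim
        # One (W, U, b) triple per gate: input i, forget f, output o, update u.
        self.W = {g: mat(hid_dim, in_dim) for g in "ifou"}
        self.U = {g: mat(hid_dim, hid_dim) for g in "ifou"}
        self.b = {g: [0.0] * hid_dim for g in "ifou"}

    def gate(self, g, x, h, act):
        return [act(z) for z in vadd(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]

    def forward(self, x, children):
        """children: list of (h_k, c_k) pairs from already-processed child nodes."""
        zero = [0.0] * self.hid
        h_sum = vadd(zero, *[h for h, _ in children])  # sum of child hidden states
        i = self.gate("i", x, h_sum, sigmoid)
        o = self.gate("o", x, h_sum, sigmoid)
        u = self.gate("u", x, h_sum, math.tanh)
        c = [a * b for a, b in zip(i, u)]
        for h_k, c_k in children:
            f_k = self.gate("f", x, h_k, sigmoid)  # separate forget gate per child
            c = [cj + fj * ckj for cj, fj, ckj in zip(c, f_k, c_k)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

def encode_tree(cell, node):
    """node = (feature_vector, [child nodes]); returns the root's (h, c), computed bottom-up."""
    x, kids = node
    return cell.forward(x, [encode_tree(cell, k) for k in kids])

# Usage on a toy tree: a root with two leaf children.
cell = ChildSumTreeLSTMCell(in_dim=3, hid_dim=4)
leaf = ([1.0, 0.0, 0.0], [])
root_h, root_c = encode_tree(cell, ([0.0, 1.0, 0.0], [leaf, leaf]))
```

Because child hidden states are summed, the root encoding does not depend on child order (up to floating-point rounding), which suits unordered branching structures.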
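Medical vision-and-language pre-training provides a feasible solution for extracting effective visual and linguistic representations from medical images and texts. However, few studies have been dedicated to this field to facilitate medical vision-and-language understanding. In this paper, we propose a self-supervised learning paradigm with multi-modal masked autoencoders (M$^3$AE), which learn cross-modal domain knowledge by reconstructing missing pixels and tokens from randomly masked images and texts. There are three key designs that make this simple approach work. First, considering the different information densities of vision and language, we adopt different masking ratios for the input image and text, with a larger masking ratio for images. Second, we use visual and textual features from different layers to perform the reconstruction, to deal with the different levels of abstraction in vision and language. Third, we develop different designs for the vision and language decoders (i.e., a Transformer for vision and a multi-layer perceptron for language). To enable a comprehensive evaluation and facilitate further research, we construct a medical vision-and-language benchmark comprising three tasks. Experimental results demonstrate the effectiveness of our approach, which achieves state-of-the-art results on all downstream tasks. Furthermore, we conduct additional analyses to better verify the different components of our approach and various pre-training settings. The source code is available at \url{}.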
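The first key design, different masking ratios per modality, can be sketched as plain random masking applied separately to image patches and text tokens. The specific ratios here (0.75 for image patches, 0.15 for text tokens, in the style of MAE and BERT) and the `[MASK]` token string are assumptions; the abstract only states that images use the larger ratio.

```python
import random

IMAGE_MASK_RATIO = 0.75  # assumption: high ratio, since pixels are spatially redundant
TEXT_MASK_RATIO = 0.15   # assumption: low ratio, since text is information-dense

def random_mask(n_items, mask_ratio, rng):
    """Return (visible_idx, masked_idx) after masking round(n_items * mask_ratio) items."""
    n_mask = int(round(n_items * mask_ratio))
    perm = list(range(n_items))
    rng.shuffle(perm)
    return sorted(perm[n_mask:]), sorted(perm[:n_mask])

def mask_text(tokens, mask_ratio, rng, mask_token="[MASK]"):
    """Replace a random subset of tokens with a mask token; return (new_tokens, masked_idx)."""
    _, masked = random_mask(len(tokens), mask_ratio, rng)
    masked_set = set(masked)
    return [mask_token if i in masked_set else t for i, t in enumerate(tokens)], masked

# Usage: a 14x14 grid of image patches and a short token sequence.
rng = random.Random(0)
visible_patches, masked_patches = random_mask(196, IMAGE_MASK_RATIO, rng)
tokens = ["tok%d" % i for i in range(20)]
masked_tokens, masked_idx = mask_text(tokens, TEXT_MASK_RATIO, rng)
```

The reconstruction targets would then be the pixels of `masked_patches` and the original tokens at `masked_idx`; whether masked patches are dropped from the encoder input or replaced by mask tokens is not stated in the abstract.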
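Tiny machine learning (TinyML), which runs AI workloads under severely constrained resources and power, is an important and challenging topic. This brief first presents an extremely tiny backbone for constructing high-efficiency CNN models for various visual tasks. Then, a specially designed neural co-processor (NCP) is interconnected with an MCU to build an ultra-low-power TinyML system, which stores all features and weights on chip and completely eliminates the latency and power consumption of off-chip memory access. Furthermore, a dedicated instruction set is proposed to enable agile development and rapid deployment. Extensive experiments show that the proposed TinyML system, based on our model, NCP, and instruction set, yields considerable accuracy and achieves a record ultra-low power of 160 mW while performing object detection and recognition at 30 FPS. A demo video is available at \url{}.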
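As a back-of-the-envelope check, and assuming the reported 160 mW is the average system power sustained while processing at 30 FPS (the abstract does not say how power was measured), the energy budget per frame works out to roughly 5.3 mJ:

$$E_{\text{frame}} = \frac{P}{f} = \frac{160\ \text{mW}}{30\ \text{frames/s}} = \frac{0.160\ \text{J/s}}{30\ \text{frames/s}} \approx 5.33\ \text{mJ per frame}$$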